Speaker adaptation using constrained estimation of Gaussian mixtures
Abstract
A recent trend in automatic speech recognition systems is the use of continuous mixture-density hidden Markov models (HMMs). Although these systems achieve good average recognition performance in large-vocabulary applications, performance varies widely across speakers and degrades dramatically when the user differs radically from the training population. A popular technique for improving the performance and robustness of a speech recognition system is to adapt the speech models to the speaker and, more generally, to the channel and the task. In continuous mixture-density HMMs the number of component densities is typically very large, and it may not be feasible to acquire enough adaptation data for robust maximum-likelihood estimates. To solve this problem, we propose a constrained estimation technique for Gaussian mixture densities. The algorithm is evaluated on the large-vocabulary Wall Street Journal corpus for both native and nonnative speakers of American English. For nonnative speakers, the recognition error rate is approximately halved with only a small amount of adaptation data, and it approaches the speaker-independent accuracy achieved for native speakers. For native speakers, the recognition performance after adaptation improves to the accuracy of speaker-dependent systems that use six times as much training data.
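The motivation for constraining the estimates can be illustrated with a small sketch. The abstract does not state the form of the constraint, so the code below is an assumption made for illustration only and is not the authors' algorithm: every adapted mean is tied to its speaker-independent mean through one shared per-dimension scale `a` and offset `b`, the variances and weights are held fixed, and `a` and `b` are fitted by EM. The function name `adapt_means_shared_affine` and all parameter names are hypothetical.

```python
import numpy as np

def adapt_means_shared_affine(X, means, variances, weights, n_iter=10):
    """Fit a per-dimension scale a and offset b so that the adapted means
    are a * means + b (illustrative constraint, assumed for this sketch).

    X: (N, D) adaptation frames; means, variances: (K, D) diagonal Gaussians;
    weights: (K,) mixture weights. Variances and weights stay fixed.
    """
    D = X.shape[1]
    a = np.ones(D)
    b = np.zeros(D)
    m = means[None, :, :]                     # (1, K, D)
    x = X[:, None, :]                         # (N, 1, D)
    for _ in range(n_iter):
        # E-step: responsibilities under the currently adapted model.
        diff = x - (a * means + b)[None, :, :]            # (N, K, D)
        log_p = -0.5 * np.sum(
            diff ** 2 / variances + np.log(2 * np.pi * variances), axis=2
        ) + np.log(weights)
        log_p -= log_p.max(axis=1, keepdims=True)         # numerical stability
        gamma = np.exp(log_p)
        gamma /= gamma.sum(axis=1, keepdims=True)         # (N, K)

        # M-step: weighted least squares for (a, b), one dimension at a time.
        w = gamma[:, :, None] / variances[None, :, :]     # (N, K, D)
        s_w = w.sum(axis=(0, 1))
        s_m = (w * m).sum(axis=(0, 1))
        s_mm = (w * m * m).sum(axis=(0, 1))
        s_x = (w * x).sum(axis=(0, 1))
        s_xm = (w * x * m).sum(axis=(0, 1))
        # det is zero if all means coincide in a dimension (for example K = 1).
        det = s_w * s_mm - s_m ** 2
        a = (s_w * s_xm - s_m * s_x) / det
        b = (s_mm * s_x - s_m * s_xm) / det
    return a, b
```

Under this assumed constraint the adaptation data only has to determine 2D numbers rather than a separate mean for each of the K components, which is why a small amount of data can still move every Gaussian, including those that received no frames.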
Similar resources
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
Principal mixture speaker adaptation for improved continuous speech recognition
Nowadays, almost all speaker-independent (SI) speech recognition systems use CDHMMs with multivariate Gaussian mixtures as the observation density to cover speaker variability. It has been shown that, given sufficient training data, the more mixture components are used in the HMM observation density, the better the system performs. However, an acoustic HMM with more Gaussian densities is more complex and slows ...
Text Independent Speaker Identification Using Automatic Acoustic Segmentation
This paper describes an acoustic class dependent technique for text independent speaker identification on very short utterances. The technique is based on maximum likelihood estimation of a Gaussian mixture model representation of speaker identity. Gaussian mixtures are noted for their robustness as a parametric model and their ability to form smooth estimates of rather arbitrary underlying den...
Manifold Constrained Finite Gaussian Mixtures
In many practical applications, the data is organized along a manifold of lower dimension than the dimension of the embedding space. This additional information can be used when learning the model parameters of Gaussian mixtures. Based on a mismatch measure between the Euclidean and the geodesic distance, manifold constrained responsibilities are introduced. Experiments in density estimation sh...
Journal:
- IEEE Trans. Speech and Audio Processing
Volume: 3
Publication date: 1995